The Convolutional Neural Network (CNN) has been applied in many fields, such as image classification, face detection, and speech recognition, and has achieved remarkable results. Compared to a GPU (graphics processing unit) or an ASIC (application-specific integrated circuit), an FPGA (field-programmable gate array)-based CNN accelerator has great advantages due to its low power consumption and reconfigurability. However, an FPGA's extremely limited resources and a CNN's huge number of parameters and high computational complexity pose great challenges to the design. Based on the ZYNQ heterogeneous platform, and balancing resource and bandwidth constraints with the roofline model, the CNN accelerator we designed can accelerate both standard convolution and depthwise separable convolution with a high hardware resource utilization rate. The accelerator can handle network layers of different scales through parameter configuration, and it maximizes bandwidth and achieves a fully pipelined design by using a data-stream interface and a ping-pong on-chip cache. The experimental results show that the accelerator designed in this paper achieves 17.11 GOPS for 32-bit floating point while also accelerating depthwise separable convolution, which gives it obvious advantages over other designs.
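The computational saving that makes depthwise separable convolution attractive for a resource-limited FPGA can be sketched by counting multiply-accumulate (MAC) operations; the layer shapes below are illustrative assumptions, not figures from the paper:

```python
# Sketch: MAC counts for a standard convolution versus a depthwise
# separable convolution (depthwise stage + 1x1 pointwise stage).
# All shapes (h, w, k, c_in, c_out) are hypothetical examples.

def standard_conv_macs(h, w, k, c_in, c_out):
    # Each of the h*w output positions needs k*k*c_in MACs
    # for every one of the c_out output channels.
    return h * w * k * k * c_in * c_out

def depthwise_separable_macs(h, w, k, c_in, c_out):
    # Depthwise stage: one k*k filter applied per input channel.
    depthwise = h * w * k * k * c_in
    # Pointwise stage: 1x1 convolution mixing c_in channels into c_out.
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    h = w = 112          # feature-map size (assumed)
    k = 3                # kernel size (assumed)
    c_in, c_out = 32, 64 # channel counts (assumed)
    std = standard_conv_macs(h, w, k, c_in, c_out)
    sep = depthwise_separable_macs(h, w, k, c_in, c_out)
    print(f"standard: {std}  separable: {sep}  ratio: {std / sep:.2f}")
```

For these shapes the separable form needs roughly 1/(1/c_out + 1/k²) times fewer MACs, which is why an accelerator that handles both convolution types efficiently matters.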